About

This notebook is to facilitate collaboration on building and evaluating models to estimate pharmaceutical product expiration date. The notebook contains:

  • An example reaction mechanism that can be used to *simulate* theoretical data.
  • Several models to predict shelf life based on various assumptions.

To compare models:

  1. Set the theoretical, experimental parameters, and acceptance parameters in indicated area below.
  2. Run the simulation and models *Cell-> Run All*
  3. The model compairsons are updated and summarized in graphs and tables in the notebook.

To contribute, please make a copy of the notebook (File -> Make a Copy), and rename. The following modifications can be made.</b>

  • Plain english comments can be added by clicking on and existing cell and typing, or creating a new cell (*Cell -> Insert*) then changing the cell contents to markdown (*Cell -> Cell Type -> Markdown*).
  • A new model can be added by going to the bottom and appending the code.
  • The reaction mechanism can be changed by modifying the integration cell below.

For general help on using jupyter notebooks, see the jupyter home page
For more information on the impurity prediction project see the project github page.

Example degredation reaction with two pathways

In this example the impurity product, $P$, can either be produced from crystalline drug, $D$, or an amorphous intermediate, $I$, that is more reactive (this example is extracted from Waterman et al. Pharm Res Vol 24 2007 p 783). \begin{equation} \frac{dP}{dt} = k_3 D + k_2 I \end{equation} \begin{equation} \frac{d I}{dt} = k_{1f} D - k_{1r} I - k_2 I \end{equation} \begin{equation} \frac{d D}{dt} = - k_{1f} D + k_{1r} I - k_3 D \end{equation}

Expressing the system in matrix form. \begin{equation} \frac{d}{dt} \begin{bmatrix} P \\ I \\ D \end{bmatrix} = \begin{bmatrix} 0 & k_2 & k_3 \\ 0 & (-k_{1r} - k_2) & k_{1f} \\ 0 & k_{1r} & (-k_{1f}-k_3) \end{bmatrix} \begin{bmatrix} P \\ I \\ D \end{bmatrix} \end{equation}

The rate constants are assumed to follow an Arrhenius temperature dependence: \begin{equation} k = A \exp \left (- \frac{E}{RT} \right ) \end{equation}

Edit the parameters below for the above reaction model.
A : refers to the Arrhenius prefactor ($\textrm{sec}^{-1}$). The subfix is defined by the above model.
E : refers to the Arrhenius activation energy (kcal/mol). The subfix is defined by the above model.
Po, Io, Do : refer to the initial product $P$, intermediate, $I$ and drug, $D$, component concentrations in the above model. The concentrations are defined as the fraction of component relative to the total amount of starting reactant. For example, if the starting formulation has 90% of the drug as crystalline, and 10% amorphous, then the ratios would be $D_o$=0.9, $I_o$=0.1, and $P_o$=0.
Temperatures: refers to the stability testing temperatures.
days: refers to the time points that each sample is measured.
impurity_setpoint: defines the acceptance criteria for the impurity concentration

When finished run menu Cell -> Run Cells. Scroll down to see the updated results.


In [82]:
# kinetic parameters (kcal/mol)
A1f = 1e4
E1f = 22

A1r = 1e4
E1r = 26

A2 = 1e6
E2 = 20

A3 = 1e5
E3 = 21

Po = 0
Io = .1
Do = 0.9

# temperatures (up to 4 different temperatures)
Temperatures = [25, 40, 60, 80]

# time points in days
days = [[0, 7, 14], # days at first temperature
        [0, 5, 10], # days at second temperature
        [0, 3, 7], # days at third temperature
        [0, 1, 5]] # days at fourth temperature

# acceptance criteria for impurity concentration
# final drug concentration = initial drug concentration (1 - impurity setpoint)
# or
# impurity concentration = initial drug concentration * impurity setpoint
impurity_setpoint = 0.05

Here are the main Python imports for solving the ODEs, plotting and other data analysis. Import modules as needed.


In [83]:
# Python module imports
import numpy as np
from matplotlib import pyplot as plt
import pandas as pd
from scipy.integrate import odeint
from IPython.display import Image
from IPython.core.display import HTML 
from scipy.optimize import minimize
from scipy.optimize import curve_fit
import statsmodels.formula.api as sm
%matplotlib inline
from pdb import set_trace

plt.rcParams.update({'figure.figsize':(8,6),
                     'figure.titlesize': 16.,
                     'axes.labelsize':12., 
                     'xtick.labelsize':12., 
                     'legend.fontsize': 12.})

The following numerically integrates the reaction rate to give the exact impurity profile. Nothing needs to be modified here by the user. Scroll down to see calculation outputs and model comparisons.


In [84]:
# gas constant kcal/mol K
R = 0.00198588

# define the domain for the ODEs solution
ndays = np.max(days) + 365*10
dt = 5000.
t = np.arange(0, ndays*(24*3600), dt)
npts = t.shape[0]

# differential equation solution variable
cols = [(T, C) for T in Temperatures for C in ['P','I','D']]
concentrations = pd.DataFrame({col: np.zeros(t.shape[0]) for col in cols})
concentrations['t'] = t

# experimental results
iterables = [(T, d) for i, T in enumerate(Temperatures) for d in days[i]]
index = pd.MultiIndex.from_tuples(iterables, names=['Temperature', 'days'])
results = pd.DataFrame(data=np.zeros(12), index=index, columns=['P'])

# predicted times
predictions = pd.DataFrame(data=np.zeros(len(Temperatures)), index=Temperatures, columns=['Theory'])

# calculate rate constants from the user defined Arrhenius values
def Arrhenius(A, E, T):
    return A*np.exp(-E/(R*T))

# calculate the rate constants for each user defined temperature
k1f = [Arrhenius(A1f, E1f, T+273.) for T in Temperatures]
k1r = [Arrhenius(A1r, E1r, T+273.) for T in Temperatures]
k2 = [Arrhenius(A2, E2, T+273.) for T in Temperatures]
k3 = [Arrhenius(A3, E3, T+273.) for T in Temperatures]

# definition of the derivative of the concentrations vector variable. 
def deriv(y, t, k1f, k1r, k2, k3):
    P, I, D = y
    dPdt = k2*I + k3*D
    dIdt = (-k1r-k2)*I + k1f*D
    dDdt = k1r*I + (-k1f-k3)*D
    dydt = [dPdt, dIdt, dDdt]
    return dydt

# initial conditions (user defined)
yo = [Po, Io, Do]

# call solver for each user defined temperature
for i, T in enumerate(Temperatures):
    yt = odeint(deriv, yo, t, args=(k1f[i], k1r[i], k2[i], k3[i]))
    for j, C in enumerate(['P', 'I', 'D']):
        concentrations[T,C] = yt[:,j]

# find the impurity concentration at the user defined time points 
index = (np.array(days)/float(ndays)*npts).astype('int')
for i, T in enumerate(Temperatures):
    for j, d in enumerate(days[i]):
        results.loc[T,d].P = concentrations[T,'P'].iloc[index[i,j]]
        
# find the time that the reaction will produce the user-defined impurity concentration
for T in Temperatures:
    predictions.loc[T].Theory = t[np.argmin(np.abs(concentrations[T,"P"] - impurity_setpoint))]/(3600*24*365)
    
# define a "private" dictionary to contain all data
_dat = dict(kinetics={'A1f': A1f, 'E1f': E1f, 'A1r': A1r, 'E1r': E1r, 'A2': A2, 'E2': E2, 'A3':A3, 'E3': E3},
            initial={'Po': Po, 'Io': Io, 'Do':Do},
            setpt=impurity_setpoint)

_dat['const'] = {'R': R}
_dat['conc'] = concentrations
_dat['exppts'] = results
_dat['predictions'] = predictions

Below are the calculated results for the user-defined kinetics


In [85]:
t = _dat['conc']['t'].values

t_days = t / (24*3600)
max_days_index = np.argmin(np.abs(t_days - (np.max(_dat['exppts'].index.levels[1].values)+1)))
t_days = t_days[:max_days_index]
c = _dat['conc'].iloc[:max_days_index]
max_conc = c.iloc[:, c.columns.get_level_values(1)=='P'].max().max()

fig = plt.figure(figsize=(14,10))
for i, T in enumerate(Temperatures):
    ax = fig.add_subplot(2,2,i+1)
    ax.plot(t_days, c[T,'P'], color='red', label='P')
    ax.plot(_dat['exppts'].loc[T].index.values, _dat['exppts'].loc[T].P, ls="none", marker='o', color="red")
    ax2 = ax.twinx()
    ax2.plot(t_days, c[T,'I'], color='green', label='I')
    ax2.plot(t_days, c[T,'D'], color='blue', label='D')
    ax.set_ylim([0, max_conc])
    ax2.set_ylim([0,1])
    ax.set_xlabel("time (days)")
    ax.set_ylabel("impurity concentration (%)")
    ax2.set_ylabel("intermediate and drug concentration (%) ")
    ax.legend(['P'], loc="upper left")
    ax2.legend(loc="upper right")
    ax.set_title("%s C"%(Temperatures[i]))


Concentration profiles: The solid lines are the exact concentrations of the components $P$, $I$, $D$ as a function of time calculated from the defined reaction parameters. The filled symbols denote the experimental time points (defined by the user in the parameter list).


In [86]:
_dat['exppts']


Out[86]:
P
Temperature days
25 0 0.000000
7 0.000147
14 0.000295
40 0 0.000000
5 0.000538
10 0.001074
60 0 0.000000
3 0.002224
7 0.005179
80 0 0.000000
1 0.004148
5 0.019836

Theoretical concentration measurements: The above tabulates the theoretical impurity concentrations, $P$, at the measurement time points.


In [87]:
_dat['setpt']


Out[87]:
0.05

In [88]:
t = _dat['conc']['t'].values
c = _dat['conc']

t_days = t / (24*3600)
max_conc = c.iloc[:, c.columns.get_level_values(1)=='P'].max().max()

fig = plt.figure(figsize=(14,10))
for i, T in enumerate(Temperatures):
    ax = fig.add_subplot(2,2,i+1)
    ax.plot(t_days, c[T,'P'], color='red', label='P')
    ax.plot(t_days, [_dat['setpt']]*t_days.shape[0], color='red', ls=':')
    ax.plot(_dat['exppts'].loc[T].index.values, _dat['exppts'].loc[T].P, ls="none", marker='o', color="red")
    ax.set_ylim([0, max_conc])
    ax.set_xlabel("time (days)")
    ax.set_ylabel(r"impurity concentration (%)")
    ax.legend(['P'], loc="upper left")
    ax.set_title("%s C"%(Temperatures[i]))
    ax2 = ax.twinx()
    ax2.plot(t_days, c[T,'I'], color='green', label='I')
    ax2.plot(t_days, c[T,'D'], color='blue', label='D')
    ax2.set_ylim([0,1])
    #ax2.set_ylabel(r"drug concentration")
    ax2.legend(loc="upper right")


Concentration profiles: The plots above show the concentration profiles over a duration sufficient that the impurity concentrations reaches the user defined set point (denoted by the red dashed line).


In [89]:
_dat['predictions']


Out[89]:
Theory
25 8.006247
40 1.556317
60 0.220225
80 0.038844

Shelf life: The above tabulates the theoretical shelf life in years calculated from the user defined kinetic parameters. The value for $25~ ^\circ C$ is the most useful as this is the number that the following models will attempt to predict.

Shelf life prediction assuming zero order kinetics

The actual reaction mechanism is unknown in advance, thus a simple reaction mechanism is assumed. Assuming the reaction kinetics are modeled by a zeroth order on the reactive drug product: \begin{equation} r_P \equiv \frac{dP}{dt} = A \exp(-\frac{E}{RT}) \end{equation} \begin{equation} P = A \exp(-\frac{E}{RT})t \end{equation} \begin{equation} \ln \left ( \frac{P}{t} \right ) = \ln A - \frac{E}{R} \frac{1}{T} \end{equation} The parameters, $A$ and $E$ can readily be obtained by a plot of $\ln \left ( \frac{P}{t} \right ) \equiv \ln (k_P)$ versus $\frac{1}{T}$.
The time is then calculated as: \begin{equation} t = \frac{P}{\hat{A} \exp(-\frac{\hat{E}}{RT})} \end{equation}


In [90]:
fig = plt.figure(figsize=(8,6))
ax = fig.add_subplot(111)
r = _dat['exppts'].iloc[_dat['exppts'].index.get_level_values(1)!=0].copy()
r['y'] = np.log(np.divide(r['P'].values, r.index.get_level_values(1).astype('f8').values*24*3600+1))
r['x'] = 1./(r.index.get_level_values(0).astype('f8').values + 273)
ax.plot(r.x, r.y, ls='none', marker='o')
ax.set_xlabel(r'$1/T$')
ax.set_ylabel(r'$\ln (P/t)$')

reg = sm.ols(formula="y ~ x", data=r).fit()
pred = reg.params[1]*r.x + reg.params[0]
ax.plot(r.x, pred)

A = np.exp(reg.params[0])
E = -1*reg.params[1]*R

print "regression fit for A = %s "%(A)
print "regression fit for E = %s"%(E)


regression fit for A = 113235.62698 
regression fit for E = 19.9829194279

Regression analysis: The above figure plots the logarithm of the measured impurity concentration divided by time, $\ln (P/t)$, against the reciprical temperature, $1/T$ (filled circles), which would be linear for a zero-order kinetics. The line is the result of a linear fit to the data.


In [91]:
print reg.summary()


                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                  1.000
Method:                 Least Squares   F-statistic:                 1.464e+05
Date:                Tue, 03 May 2016   Prob (F-statistic):           2.15e-14
Time:                        12:59:34   Log-Likelihood:                 23.586
No. Observations:                   8   AIC:                            -43.17
Df Residuals:                       6   BIC:                            -43.01
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     11.6372      0.082    142.596      0.000      11.438      11.837
x          -1.006e+04     26.301   -382.585      0.000   -1.01e+04   -9998.144
==============================================================================
Omnibus:                        5.017   Durbin-Watson:                   1.772
Prob(Omnibus):                  0.081   Jarque-Bera (JB):                1.518
Skew:                          -1.050   Prob(JB):                        0.468
Kurtosis:                       3.379   Cond. No.                     5.08e+03
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 5.08e+03. This might indicate that there are
strong multicollinearity or other numerical problems.

The above regression provides the kinetic parameters necessary to calculate the predicted shelf-life assuming the mechanism is zeroth-order.


In [92]:
temp = _dat['const']['R']*(_dat['predictions'].index.astype('f8').values+273)
k = A * np.exp(-E/temp)
_dat['predictions']['zero_order'] = _dat['setpt'] / k * 1/(3600*24*365)

In [93]:
_dat['predictions']


Out[93]:
Theory zero_order
25 8.006247 6.470045
40 1.556317 1.282699
60 0.220225 0.186024
80 0.038844 0.033576

Table of predicted results The above table compares the theoretical shelf-life in years (calculated from the exact reaction mechanism and user defined parameters) to the predicted shelf-life assuming zero order kinetics.

Shelf life prediction assuming first-order kinetics

Assuming a reaction mechanism as: In this prediction model, the impurity production rate is assumed to be first-order in the total drug concentratjion. \begin{equation} r_P \equiv \frac{dP}{dt} = k D_T \end{equation} where $D_T$ is the total drug concentration, $P$ is the impurity concentration and $k$ is the kinetic parameters, which is assumed to have an Arrehnius temperature dependence. \begin{equation} k = A \exp \left ( -\frac{E}{RT} \right ) \end{equation} The impurity concentration is linked to the drug concentration by a mass balance \begin{equation} D = (D_o + I_o) - P = D_{To} - P \end{equation} where $D_o$ is the starting crystalline drug concentration, $I_o$ is the starting amorphous drug concentration, and $D_{To}$ is the starting total drug concentration. Substituting, \begin{equation} \frac{dP}{dt} = k [D_{To} - P] \end{equation} The first-order ODE can be solved with the integrating factor. \begin{equation} \mu(t) = e^{\int k dt} = e^{kt} \end{equation} \begin{equation} \frac{d}{dt} \left [P e^{kt} \right ] = e^{kt} kD_{To} \end{equation} Integrating \begin{equation} \left [P e^{kt} \right ] = \int_0^t e^{kt} kD_{To} = D_{To} e^{kt} - D_{To} = D_{To} (e^{kt} - 1) \end{equation} Rearranging: \begin{equation} P = D_{To} (1 - e^{-k t}) = D_{To} \left [ 1 - \exp \left (\exp \left (- A \frac{E}{RT} t \right ) \right ) \right ] \end{equation} The above equation links the experimental data (impurity concentration, $P$, at time points, $t$) to the reaction kinetic parameters (i.e. Arrhenius parameters $A$ and $E$).

Method 1 - Linear Regression

One method to obtain the Arrhenius parameters is to a stepwise calculation. First, the impurity concentration experiments conducted at the same temperature, $T$, the kinetic parameter, $k$, at the temperature $T$ can be obtained by plotting log of the drug concentration over time (recall that $P = D_{To} - D$). \begin{equation} D_{To} - D = D_{To} (1-e^{-k t}) \end{equation} \begin{equation} D = D_{To} e^{-k t} \end{equation} \begin{equation} \ln D = \ln D_{To} - k t \end{equation} \begin{equation} \ln \left ( \frac{D}{D_{To}} \right ) = - k t \end{equation}


In [94]:
D = Do + Io - _dat['exppts']['P']
data = np.log(D/(Do+Io)).to_frame('ln(D/Do)')
data['t'] = data.index.get_level_values(1)*24*3600
data['k'] = [0]*data['t'].shape[0]

colors = ['blue','green','red','orange']
fig = plt.figure(figsize=(8,6))
ax = fig.add_subplot(111)
for i, T in enumerate(_dat['exppts'].index.levels[0]):
    ax.plot(data.loc[T]['t'], data.loc[T]['ln(D/Do)'], ls='none', marker='o', color=colors[i])
    slope = sm.OLS(data.loc[T]['ln(D/Do)'], data.loc[T]['t']).fit().params[0]
    data.loc[(T,slice(None)),'k'] = -1 * slope
    ax.plot(data.loc[T]['t'], data.loc[T]['t']*slope, 
             ls='--', color=colors[i], label="%.2e 1/s"%(-1*slope))
ax.set_ylabel(r'ln $\left ( D/D_{To} \right )$')
ax.set_xlabel(r'$t$ (sec)')
plt.legend(loc='best')


Out[94]:
<matplotlib.legend.Legend at 0xa746a48c>

Figure $\ln \left ( \frac{D}{D_o} \right ) ~\mathrm{vs}~t$: to estimate the the rate constant at each experimental temperature, which is subsequently used to estimate the activation energy and collision factor for initial guess in the non-linear regression.


In [95]:
data


Out[95]:
ln(D/Do) t k
Temperature days
25 0 0.000000 0 2.438335e-10
7 -0.000147 604800 2.438335e-10
14 -0.000295 1209600 2.438335e-10
40 0 0.000000 0 1.243739e-09
5 -0.000538 432000 1.243739e-09
10 -0.001074 864000 1.243739e-09
60 0 0.000000 0 8.585526e-09
3 -0.002227 259200 8.585526e-09
7 -0.005192 604800 8.585526e-09
80 0 0.000000 0 4.644401e-08
1 -0.004156 86400 4.644401e-08
5 -0.020035 432000 4.644401e-08

Table of the kinetic parameter, $k$ regressed from data at the different experimental temperatures.

Once the kinetic parameter, $k$, is estimated at the different temperatures, the Arrenius parameters, $A$ and $E$, can be obtained by plotting the logarithm of the rate constant at each experimental temperature against the reciprical temperature. \begin{equation} \ln k = \ln A - \frac{E}{R} \frac{1}{T} \end{equation}


In [96]:
_dat['exppts']


Out[96]:
P
Temperature days
25 0 0.000000
7 0.000147
14 0.000295
40 0 0.000000
5 0.000538
10 0.001074
60 0 0.000000
3 0.002224
7 0.005179
80 0 0.000000
1 0.004148
5 0.019836

In [97]:
# variable data is used from above cell
df = pd.DataFrame({'x': 1./(data.index.levels[0]+273).values, 'y': np.log(data.loc[(slice(None),0),'k'])})

reg = sm.ols(formula='y~x', data=df).fit()
A = np.exp(reg.params['Intercept'])
E = -1 * R * reg.params['x']

fig = plt.figure(figsize=(8,6))
ax = fig.add_subplot(111)
ax.plot(df.x, df.y, marker='o', ls='none', color='blue', label='data')
ax.plot(df.x, df.x*reg.params['x'] + reg.params['Intercept'], ls="--", color='blue', label='fit')
ax.set_xlabel(r"$1/T$ (1/K)")
ax.set_ylabel(r"log $k$")
ax.legend()

print "regression fit for A = %.2e"%A
print "regression fit for E = %.2f"%E


regression fit for A = 1.06e+05
regression fit for E = 19.94

Figure $\ln k ~\mathrm{vs}~\frac{1}{T}$: to estimate the activation energy and the collision factor to use as initial guesses in the subsequent nonlinear regression.

The estimated Arrhenius parameters can now be used to estimate shelf-life.


In [98]:
temp = _dat['const']['R']*(_dat['predictions'].index.astype('f8').values+273)
k = A * np.exp(-E/temp)
_dat['predictions']['first_order'] = _dat['setpt'] / k * 1/(3600*24*365)
_dat['predictions']


Out[98]:
Theory zero_order first_order
25 8.006247 6.470045 6.450609
40 1.556317 1.282699 1.282839
60 0.220225 0.186024 0.186738
80 0.038844 0.033576 0.033817
Method 2 - non-linear regression

According to a paper by Fung (Statistical prediction of drug stability based on nonlinear parameter estimation, Fung et al Journal Pharm Sci Vol 73 No 5 pg 657-662 1984), improved confidence can be obtained by performing a non-linear regression of all the data simuntaneously, instead of the stepwise approach. In Fung's paper, they pre-assumed a shelf-life defined by 10% degredation of the product, which simplifies the estimation. However, the regression done here will be performed without the assumption.

\begin{equation} D = D_{To} e^{-k t} \end{equation}\begin{equation} \ln D = \ln D_{To} - kt \end{equation}\begin{equation} \ln \left ( \frac{D}{D_{To}} \right ) = - A \exp \left ( -\frac{E}{RT} \right ) t \end{equation}\begin{equation} \left ( \frac{\ln \left ( \frac{D}{D_{To}} \right )}{t} \right ) + A \exp \left ( -\frac{E}{RT} \right ) = 0 \end{equation}


Thus, the Arrhenius parameters are found by finding the roots to the above equation, where $A$ and $E$ are the variables.


In [99]:
D = Do + Io - _dat['exppts']['P'].values
df = pd.DataFrame(np.log(D/(Do+Io)), columns=['D*'])
df['t'] = _dat['exppts'].index.get_level_values(1)*24*3600
df['T'] = _dat['exppts'].index.get_level_values(0) + 273.
df = df.loc[df['t']!=0]
def error(p):
    A = p[0]
    E = p[1]
    return np.sum((df['D*']/df['t'] + A * np.exp(-1. * E/(R*df['T'])))**2)

The optimization will be performed over a range of initial conditions to inspect convergence problems.


In [100]:
xo = np.array([A,E])
opt = []
for d in np.arange(.1,1.5,.1):
    opt.append(minimize(error, xo*d, tol=1e-30))

Plot the square errors for a series of initial conditions.


In [101]:
e2 = np.array([error(r.x) for r in opt])

fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(e2)
ax.set_ylabel(r"$\sum \mathrm{error}^2$")
ax.set_xlabel("initial condition index")

A, E = (opt[e2.argmin()]).x
print "optimal A = %.2e 1/s" %A
print "optimal E = %.2f kcal/mol K" %E


optimal A = 9.58e+04 1/s
optimal E = 19.87 kcal/mol K

Figure of square error indicates that the objective function likely has some very shallow gradients near the minimum, thus inhibiting convergence to a unique value.


In [102]:
temp = _dat['const']['R']*(_dat['predictions'].index.astype('f8').values+273)
k = A * np.exp(-E/temp)
_dat['predictions']['first_order_nonlinear'] = impurity_setpoint / k * 1/(3600*24*365)
_dat['predictions']


Out[102]:
Theory zero_order first_order first_order_nonlinear
25 8.006247 6.470045 6.450609 6.270671
40 1.556317 1.282699 1.282839 1.255067
60 0.220225 0.186024 0.186738 0.184097
80 0.038844 0.033576 0.033817 0.033565

Table of precition comparison suggests that zero_order was actually the most accurate of the models. It should be noted that the convergence of the non-linear regression is questionable. The under prediction of shelf-life is to be expected, because of the faster reaction of the amorphous form that gets consumed early in the process.

King, Kung, Fung method

As illustrated above, there is a weak dependence of the objective function on the kinetic parameters near the minimum. Consequently, the optimization problem has difficulty converging to an absolute minimum. The paper by King et al suggests some modification to the equations prior to the non-linear regression. First, the concentration profiles are written as a function of the temperature dependent reaction rate constant, $k$, for zero, first and second order reactions.

\begin{align*} & \mathrm{zero~order} & {\Huge |} & \mathrm{first~order} & {\Huge |} & \mathrm{second~order} & \\ & C = C_o - kt & {\Huge |} & C = C_o \exp(-kt) & {\Huge |} & C = \frac{C_o}{1 + C_okt} & \end{align*}

Here, $Co$ is the total starting drug concentration and $C$ is the total drug concentration at time, $t$. The rate constant is related to temperature via the Arrhenius equation. \begin{equation} k = Ae^{-\frac{E}{RT}} \end{equation} The Arrhenius relation could be substituted into the above concentration equations, and a non-linear regression be applied to estimate the parametes $A$ and $E$. However, as illustrated in the previous method example above, the gradients are very small in the vicinity of the global minimum, which makes the parameter estimation difficult. Part of the problem seems to aries because the rate constant, $A$ is on the order of $10^5$, whereas the activation energy is on the order of $10^1$. The disparity in size of the parameters, may lead to some instabilities. Fung et. al eliminate the collision parameter, $A$ by using the Arrhenius equation at a temperature of 298 K. \begin{equation} A = k_{298} e^{\frac{E}{R*298}} \end{equation} Substitution gives: \begin{equation} k = k_{298} e^{\frac{E}{R*298}} e^{-\frac{E}{RT}} = k_{298} e^{\frac{E}{R} \left ( \frac{1}{298} - \frac{1}{T} \right ) } \end{equation}

The rate constant at 298 K, $k_{298}$, is then exressed as a function of the user defined concentration set-point to define the product shelf-life, $C_f$, and the time it will take the drug to reach the user set-point at 298 K, $t_{298}$. \begin{equation} k_{298} = f(C_f, t_{298}) \end{equation} Here, $f$ is a function that depends on the reaction order. The shelf-life concentration set-point can be exressed in terms of the variable impurity-setpoint, $S_p$ as: \begin{equation} S_p = \frac{C_o - C_f}{C_o} \rightarrow C_f = C_o(1-S_p) \end{equation} Rearranging the above concentration equations for zero, first and second order reactions and defining the parameters using the 298 K subscript gives: \begin{align*} & \mathrm{zero~order} & {\Huge |} & \mathrm{first~order} & {\Huge |} & \mathrm{second~order} & \\ & k_{298} = \frac{C_o - C_f}{t^*_{298}} & {\Huge |} & k_{298} = -\frac{ln \frac{C_f}{C_o}}{t^*_{298}} & {\Huge |} & k_{298} = \frac{1}{t^*_{298}} \left ( \frac{1}{C_f} - \frac{1}{C_o} \right ) \\ & k_{298} = \frac{S_p }{t^*_{298}} & {\Huge |} & k_{298} = -\frac{ \ln (1 - S_p) }{t^*_{298}} & {\Huge |} & k_{298} = \frac{ 1 }{C_o t^*_{298}} \left ( \frac{S_p}{1-S_p} \right ) \end{align*} Substituting the expression for the rate constant at 298 K, $k_{298}$, into the concentration equations gives:

\begin{align*} & \mathrm{zero~order} & {\Huge |} & \mathrm{first~order} & {\Huge |} & \mathrm{second~order} \\ & C = C_o \left [ 1 - \frac{S_p}{t^*_{298}} e^{\frac{E}{R} \left ( \frac{1}{298} - \frac{1}{T} \right ) } t \right ] & {\Huge |} & C = C_o \exp \left ( \frac{\ln (1-S_p)}{t^*_{298}} e^{\frac{E}{R} \left ( \frac{1}{298} - \frac{1}{T} \right ) } t \right ) & {\Huge |} & C = \frac{C_o}{1 + \frac{ 1 }{t^*_{298}} \left ( \frac{S_p}{1-S_p} \right ) e^{\frac{E}{R} \left ( \frac{1}{298} - \frac{1}{T} \right ) } t } \\ \end{align*}

In the above relationships, the fit parameters are the activation energy, $E$, and the product shelf-life at 298 K, $t^*_{298}$.

Define the variables that will be used for the subsequent non-linear regression analysis.


In [103]:
Co = Do + Io
C = Co - _dat['exppts']['P'].values
t = _dat['exppts'].index.get_level_values(1).values*24*3600.
T = _dat['exppts'].index.get_level_values(0).values + 273.
C = C[t!=0]
T = T[t!=0]
t = t[t!=0]
Sp = _dat['setpt']
R = _dat['const']['R']

Zero order (method of King, Kung, Fung)

Define the objective function for a first order reaaction.


In [104]:
def error(p):
    t_298 = p[0]
    E = p[1]
    k = np.exp(E/R * (1/298. - 1/T))
    k = Sp / t_298 * k
    err = Co * (1 - k * t)
    return np.sum(err**2)

Perform the non-linear regression over a range of initial conditions.


In [105]:
t_298 = 10.*365*24*3600
E = 10.
xo = np.array([t_298,E])
opt = []
for d in np.arange(.1,10,.1):
    opt.append(minimize(error, xo*d, tol=1e-30))

Plot the error of the regression as a function of the initial conditions.


In [106]:
e2 = np.array([error(r.x) for r in opt])

fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(e2)
ax.set_ylabel(r"$\sum \mathrm{error}^2$")
ax.set_xlabel("initial condition index")

t_298, E = (opt[e2.argmin()]).x
print "optimal t_298 = %.1f y" %(t_298/(365*24*3600))
print "optimal E = %.2f kcal/mol K" %E


optimal t_298 = 26.0 y
optimal E = 40.72 kcal/mol K

Figure of sum of residual erros illustrates that the expressions for the objective function has very shallow gradients near the minimum, which results in difficult in converging to a global minimum.


In [107]:
_dat['predictions']['KKF_zero'] = np.nan
_dat['predictions']['KKF_zero'].loc[25] = t_298 / (365*24*3600)

First order (method of King, Kung, Fung)

Define the objective function for a first order reaaction.


In [108]:
def error(p):
    t_298 = p[0]
    E = p[1]
    k = np.exp(E/R * (1/298. - 1/T))
    k = np.log(1-Sp) / t_298 * k
    err = Co * np.exp(k*t) - C
    return np.sum(err**2)

Perform the non-linear regression over a range of initial conditions.


In [109]:
t_298 = 10.*365*24*3600
E = 10.
xo = np.array([t_298,E])
opt = []
for d in np.arange(.1,10,.1):
    opt.append(minimize(error, xo*d, tol=1e-30))

Plot the error of the regression as a function of the initial conditions.


In [110]:
e2 = np.array([error(r.x) for r in opt])

fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(e2)
ax.set_ylabel(r"$\sum \mathrm{error}^2$")
ax.set_xlabel("initial condition index")

t_298, E = (opt[e2.argmin()]).x
print "optimal t_298 = %.1f y" %(t_298/(365*24*3600))
print "optimal E = %.2f kcal/mol K" %E


optimal t_298 = 22.0 y
optimal E = 24.52 kcal/mol K

Figure of sum of residual erros illustrates that the expressions for the objective function has very shallow gradients near the minimum, which results in difficult in converging to a global minimum.


In [111]:
_dat['predictions']['KKF_first'] = np.nan
_dat['predictions']['KKF_first'].loc[25] = t_298 / (365*24*3600)

Second order (method of King, Kung, Fung)

Define the objective function for a first order reaaction.


In [112]:
def error(p):
    t_298 = p[0]
    E = p[1]
    k = np.exp(E/R * (1/298. - 1/T))
    k = 1 / t_298 * (Sp/(1-Sp)) * k
    err = Co / (1 + k*t)
    return np.sum(err**2)

Perform the non-linear regression over a range of initial conditions.


In [113]:
t_298 = 10.*365*24*3600
E = 10.
xo = np.array([t_298,E])
opt = []
for d in np.arange(.1,10,.1):
    opt.append(minimize(error, xo*d, tol=1e-30))

Plot the error of the regression as a function of the initial conditions.


In [114]:
e2 = np.array([error(r.x) for r in opt])

fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(e2)
ax.set_ylabel(r"$\sum \mathrm{error}^2$")
ax.set_xlabel("initial condition index")

t_298, E = (opt[e2.argmin()]).x
print "optimal t_298 = %.1f y" %(t_298/(365*24*3600))
print "optimal E = %.2f kcal/mol K" %E


optimal t_298 = 30.0 y
optimal E = 358.24 kcal/mol K

Figure of sum of residual erros illustrates that the expressions for the objective function has very shallow gradients near the minimum, which results in difficult in converging to a global minimum.


In [115]:
_dat['predictions']['KKF_second'] = np.nan
_dat['predictions']['KKF_second'].loc[25] = t_298 / (365*24*3600)

In [116]:
_dat['predictions']


Out[116]:
Theory zero_order first_order first_order_nonlinear KKF_zero KKF_first KKF_second
25 8.006247 6.470045 6.450609 6.270671 26 22 30
40 1.556317 1.282699 1.282839 1.255067 NaN NaN NaN
60 0.220225 0.186024 0.186738 0.184097 NaN NaN NaN
80 0.038844 0.033576 0.033817 0.033565 NaN NaN NaN